Overview

Dataset statistics

Number of variables: 13
Number of observations: 2969
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 301.7 KiB
Average record size in memory: 104.0 B
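The memory figures above are consistent with 13 numeric columns stored at 8 bytes per value. That width is an assumption about the dtypes (for example int64 or float64), since the report does not state them. A minimal arithmetic check, with hypothetical helper names:

```python
# Figures copied from the dataset statistics above.
N_VARIABLES = 13
N_OBSERVATIONS = 2969
TOTAL_KIB = 301.7

# Assumption: every column uses an 8-byte dtype (e.g. int64 or float64).
BYTES_PER_VALUE = 8


def record_size_from_dtypes(n_variables: int, bytes_per_value: int) -> int:
    """Bytes per row if every column has the same fixed width."""
    return n_variables * bytes_per_value


def record_size_from_total(total_kib: float, n_rows: int) -> float:
    """Average bytes per row derived from the reported total size."""
    return total_kib * 1024 / n_rows


by_dtype = record_size_from_dtypes(N_VARIABLES, BYTES_PER_VALUE)
by_total = record_size_from_total(TOTAL_KIB, N_OBSERVATIONS)
print(by_dtype, by_total)
```

The two estimates agree to within rounding of the reported total, which supports (but does not prove) the 8-byte assumption.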

Variable types

Numeric: 13

Warnings

gross_revenue is highly correlated with qtde_invoices and 1 other field [High correlation]
qtde_invoices is highly correlated with gross_revenue and 2 other fields [High correlation]
qtde_items is highly correlated with gross_revenue and 1 other field [High correlation]
qtde_products is highly correlated with qtde_invoices [High correlation]
avg_ticket is highly correlated with qtde_returns and 1 other field [High correlation]
qtde_returns is highly correlated with avg_ticket [High correlation]
avg_basket_size is highly correlated with avg_ticket [High correlation]
gross_revenue is highly correlated with qtde_invoices and 3 other fields [High correlation]
recency_days is highly correlated with qtde_invoices [High correlation]
qtde_invoices is highly correlated with gross_revenue and 3 other fields [High correlation]
qtde_items is highly correlated with gross_revenue and 3 other fields [High correlation]
qtde_products is highly correlated with gross_revenue and 3 other fields [High correlation]
avg_ticket is highly correlated with avg_unique_basket_size [High correlation]
avg_recency_days is highly correlated with frequency [High correlation]
frequency is highly correlated with avg_recency_days [High correlation]
avg_basket_size is highly correlated with gross_revenue and 1 other field [High correlation]
avg_unique_basket_size is highly correlated with qtde_products and 1 other field [High correlation]
gross_revenue is highly correlated with qtde_invoices and 2 other fields [High correlation]
qtde_invoices is highly correlated with gross_revenue and 2 other fields [High correlation]
qtde_items is highly correlated with gross_revenue and 3 other fields [High correlation]
qtde_products is highly correlated with gross_revenue and 2 other fields [High correlation]
avg_recency_days is highly correlated with frequency [High correlation]
frequency is highly correlated with avg_recency_days [High correlation]
avg_basket_size is highly correlated with qtde_items [High correlation]
avg_unique_basket_size is highly correlated with avg_basket_size [High correlation]
gross_revenue is highly correlated with qtde_invoices and 4 other fields [High correlation]
qtde_invoices is highly correlated with gross_revenue and 3 other fields [High correlation]
avg_basket_size is highly correlated with avg_unique_basket_size and 4 other fields [High correlation]
qtde_products is highly correlated with gross_revenue and 3 other fields [High correlation]
qtde_items is highly correlated with gross_revenue and 4 other fields [High correlation]
avg_ticket is highly correlated with avg_basket_size and 1 other field [High correlation]
qtde_returns is highly correlated with gross_revenue and 5 other fields [High correlation]
avg_ticket is highly skewed (γ1 = 25.1612653) [Skewed]
frequency is highly skewed (γ1 = 24.88065538) [Skewed]
qtde_returns is highly skewed (γ1 = 21.97906809) [Skewed]
df_index has unique values [Unique]
customer_id has unique values [Unique]
avg_ticket has unique values [Unique]
recency_days has 33 (1.1%) zeros [Zeros]
qtde_returns has 1481 (49.9%) zeros [Zeros]
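The Skewed, Unique and Zeros warnings can be approximated with a few lines of plain Python. The sketch below is not the profiler's own code: `sample_skewness` and `column_warnings` are hypothetical names, the skewness formula is the bias-corrected sample skewness (assumed to match γ1), and the thresholds (skewness above 20, zeros above 1%) are assumptions chosen only because they are consistent with which columns are flagged in this report.

```python
import math


def sample_skewness(values):
    """Bias-corrected sample skewness (adjusted Fisher-Pearson coefficient)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        return 0.0
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


def column_warnings(name, values, skew_threshold=20.0, zeros_threshold=0.01):
    """Return report-style warning strings for one numeric column.

    Both thresholds are assumptions, not values read from the report.
    """
    warnings = []
    n = len(values)
    skew = sample_skewness(values)
    if abs(skew) > skew_threshold:
        warnings.append(f"{name} is highly skewed (γ1 = {skew:.4f})")
    if len(set(values)) == n:
        warnings.append(f"{name} has unique values")
    n_zeros = sum(1 for x in values if x == 0)
    if n_zeros / n > zeros_threshold:
        warnings.append(f"{name} has {n_zeros} ({n_zeros / n:.1%}) zeros")
    return warnings


# Toy column: half of the entries are zero, so only the zeros warning fires.
toy = [0, 0, 0, 1, 2, 9]
print(column_warnings("toy_returns", toy))
```

The same percentage formatting reproduces the report's zero shares: 1481 of 2969 rows is 49.9% and 33 of 2969 is 1.1%.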

Reproduction

Analysis started: 2021-06-19 18:38:48.331254
Analysis finished: 2021-06-19 18:39:06.835707
Duration: 18.5 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json
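A report of this kind can be regenerated with the library named above. The following is a minimal sketch, assuming pandas-profiling v3.x is installed; `build_report`, the title string and the output path are hypothetical, and the original run may have used a custom configuration (the report references a config.json that is not reproduced here).

```python
def build_report(df, output_path="report.html"):
    """Profile a pandas DataFrame and write the HTML report to output_path."""
    # Imported lazily so this module still loads when pandas-profiling is absent.
    from pandas_profiling import ProfileReport

    profile = ProfileReport(df, title="Pandas Profiling Report")
    profile.to_file(output_path)
    return profile
```

Calling `build_report(df)` on the 13-column feature table would be expected to produce the sections that follow, although timings and plot details will differ between runs.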

Variables

df_index
Real number (ℝ≥0)

UNIQUE

Distinct: 2969
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2317.239475
Minimum: 0
Maximum: 5715
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 23.3 KiB

Quantile statistics

Minimum: 0
5-th percentile: 185.4
Q1: 929
median: 2120
Q3: 3537
95-th percentile: 5035.2
Maximum: 5715
Range: 5715
Interquartile range (IQR): 2608

Descriptive statistics

Standard deviation: 1554.914732
Coefficient of variation (CV): 0.6710203022
Kurtosis: -1.010612562
Mean: 2317.239475
Median Absolute Deviation (MAD): 1271
Skewness: 0.3423730333
Sum: 6879884
Variance: 2417759.825
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
0 | 1 | < 0.1%
607 | 1 | < 0.1%
597 | 1 | < 0.1%
2646 | 1 | < 0.1%
599 | 1 | < 0.1%
2648 | 1 | < 0.1%
601 | 1 | < 0.1%
603 | 1 | < 0.1%
5144 | 1 | < 0.1%
605 | 1 | < 0.1%
Other values (2959) | 2959 | 99.7%

Minimum 10 values
Value | Count | Frequency (%)
0 | 1 | < 0.1%
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
6 | 1 | < 0.1%
7 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
5715 | 1 | < 0.1%
5696 | 1 | < 0.1%
5686 | 1 | < 0.1%
5680 | 1 | < 0.1%
5659 | 1 | < 0.1%
5655 | 1 | < 0.1%
5649 | 1 | < 0.1%
5638 | 1 | < 0.1%
5637 | 1 | < 0.1%
5627 | 1 | < 0.1%
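Several of the df_index statistics are derived from others, so they can be cross-checked by hand. The sketch below recomputes the mean, range, IQR, coefficient of variation and variance from the reported figures; small differences are expected because the inputs are rounded as printed in the report.

```python
# Values copied from the df_index statistics above.
mean = 2317.239475
std = 1554.914732
q1, q3 = 929, 3537
minimum, maximum = 0, 5715
n, total = 2969, 6879884

derived = {
    "mean": total / n,           # Sum / number of observations
    "range": maximum - minimum,  # Maximum - Minimum
    "iqr": q3 - q1,              # Q3 - Q1
    "cv": std / mean,            # standard deviation / mean
    "variance": std ** 2,        # standard deviation squared
}

for name, value in derived.items():
    print(f"{name}: {value:.10g}")
```

The same identities apply to every variable section in this report.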

customer_id
Real number (ℝ≥0)

UNIQUE

Distinct: 2969
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15270.32233
Minimum: 12347
Maximum: 18287
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.3 KiB

Quantile statistics

Minimum: 12347
5-th percentile: 12619.4
Q1: 13799
median: 15220
Q3: 16768
95-th percentile: 17964.6
Maximum: 18287
Range: 5940
Interquartile range (IQR): 2969

Descriptive statistics

Standard deviation: 1718.857469
Coefficient of variation (CV): 0.1125619638
Kurtosis: -1.205579452
Mean: 15270.32233
Median Absolute Deviation (MAD): 1489
Skewness: 0.03229421811
Sum: 45337587
Variance: 2954470.998
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
16384 | 1 | < 0.1%
18164 | 1 | < 0.1%
12933 | 1 | < 0.1%
12935 | 1 | < 0.1%
14984 | 1 | < 0.1%
17033 | 1 | < 0.1%
13704 | 1 | < 0.1%
12939 | 1 | < 0.1%
17037 | 1 | < 0.1%
14125 | 1 | < 0.1%
Other values (2959) | 2959 | 99.7%

Minimum 10 values
Value | Count | Frequency (%)
12347 | 1 | < 0.1%
12348 | 1 | < 0.1%
12352 | 1 | < 0.1%
12356 | 1 | < 0.1%
12358 | 1 | < 0.1%
12359 | 1 | < 0.1%
12360 | 1 | < 0.1%
12362 | 1 | < 0.1%
12364 | 1 | < 0.1%
12370 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
18287 | 1 | < 0.1%
18283 | 1 | < 0.1%
18282 | 1 | < 0.1%
18277 | 1 | < 0.1%
18276 | 1 | < 0.1%
18274 | 1 | < 0.1%
18273 | 1 | < 0.1%
18272 | 1 | < 0.1%
18270 | 1 | < 0.1%
18269 | 1 | < 0.1%

gross_revenue
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 2963
Distinct (%): 99.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2692.872654
Minimum: 6.2
Maximum: 279138.02
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.3 KiB

Quantile statistics

Minimum: 6.2
5-th percentile: 229.77
Q1: 570.96
median: 1086.92
Q3: 2306.52
95-th percentile: 7166.028
Maximum: 279138.02
Range: 279131.82
Interquartile range (IQR): 1735.56

Descriptive statistics

Standard deviation: 10133.6576
Coefficient of variation (CV): 3.763140299
Kurtosis: 397.4504813
Mean: 2692.872654
Median Absolute Deviation (MAD): 672.58
Skewness: 17.63866
Sum: 7995138.91
Variance: 102691016.5
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
745.06 | 2 | 0.1%
331 | 2 | 0.1%
734.94 | 2 | 0.1%
379.65 | 2 | 0.1%
533.33 | 2 | 0.1%
731.9 | 2 | 0.1%
889.93 | 1 | < 0.1%
471.51 | 1 | < 0.1%
13375.87 | 1 | < 0.1%
284.46 | 1 | < 0.1%
Other values (2953) | 2953 | 99.5%

Minimum 10 values
Value | Count | Frequency (%)
6.2 | 1 | < 0.1%
13.3 | 1 | < 0.1%
15 | 1 | < 0.1%
36.56 | 1 | < 0.1%
45 | 1 | < 0.1%
52 | 1 | < 0.1%
52.2 | 1 | < 0.1%
52.2 | 1 | < 0.1%
62.43 | 1 | < 0.1%
68.84 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
279138.02 | 1 | < 0.1%
259657.3 | 1 | < 0.1%
194550.79 | 1 | < 0.1%
140438.72 | 1 | < 0.1%
124564.53 | 1 | < 0.1%
117375.63 | 1 | < 0.1%
91062.38 | 1 | < 0.1%
72882.09 | 1 | < 0.1%
66653.56 | 1 | < 0.1%
65019.62 | 1 | < 0.1%

recency_days
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 272
Distinct (%): 9.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 64.33614011
Minimum: 0
Maximum: 373
Zeros: 33
Zeros (%): 1.1%
Negative: 0
Negative (%): 0.0%
Memory size: 23.3 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2
Q1: 11
median: 31
Q3: 81
95-th percentile: 242
Maximum: 373
Range: 373
Interquartile range (IQR): 70

Descriptive statistics

Standard deviation: 77.75995124
Coefficient of variation (CV): 1.208651173
Kurtosis: 2.772678799
Mean: 64.33614011
Median Absolute Deviation (MAD): 26
Skewness: 1.796815372
Sum: 191014
Variance: 6046.610018
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
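Common values
Value | Count | Frequency (%)
1 | 99 | 3.3%
4 | 87 | 2.9%
2 | 85 | 2.9%
3 | 85 | 2.9%
8 | 76 | 2.6%
10 | 67 | 2.3%
7 | 66 | 2.2%
9 | 66 | 2.2%
17 | 64 | 2.2%
22 | 55 | 1.9%
Other values (262) | 2219 | 74.7%

Minimum 10 values
Value | Count | Frequency (%)
0 | 33 | 1.1%
1 | 99 | 3.3%
2 | 85 | 2.9%
3 | 85 | 2.9%
4 | 87 | 2.9%
5 | 43 | 1.4%
7 | 66 | 2.2%
8 | 76 | 2.6%
9 | 66 | 2.2%
10 | 67 | 2.3%

Maximum 10 values
Value | Count | Frequency (%)
373 | 2 | 0.1%
372 | 4 | 0.1%
371 | 1 | < 0.1%
368 | 1 | < 0.1%
366 | 4 | 0.1%
365 | 2 | 0.1%
364 | 1 | < 0.1%
360 | 1 | < 0.1%
359 | 1 | < 0.1%
358 | 4 | 0.1%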
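The frequency tables in each variable section list the ten most common values and fold everything else into an "Other values (N)" row, where N is the number of remaining distinct values. A minimal sketch of that aggregation using only the standard library; `common_values_table` is a hypothetical helper, not the profiler's implementation:

```python
from collections import Counter


def common_values_table(values, top_n=10):
    """Rows of (label, count, share) for the top_n values plus an 'Other values' row."""
    counts = Counter(values)
    total = len(values)
    top = counts.most_common(top_n)
    rows = [(str(value), count, count / total) for value, count in top]
    n_other_distinct = len(counts) - len(top)
    if n_other_distinct > 0:
        other_count = total - sum(count for _, count in top)
        label = f"Other values ({n_other_distinct})"
        rows.append((label, other_count, other_count / total))
    return rows


# Toy data with four distinct values; keep the top two.
sample = [1, 1, 1, 2, 2, 3, 4]
for label, count, share in common_values_table(sample, top_n=2):
    print(f"{label} | {count} | {share:.1%}")
```

For recency_days the ten listed counts sum to 750, which leaves 2969 - 750 = 2219 rows spread over the other 262 distinct values, matching the table.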

qtde_invoices
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 56
Distinct (%): 1.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.723139104
Minimum: 1
Maximum: 206
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.3 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
median: 4
Q3: 6
95-th percentile: 17
Maximum: 206
Range: 205
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 8.85653132
Coefficient of variation (CV): 1.547495379
Kurtosis: 190.8344494
Mean: 5.723139104
Median Absolute Deviation (MAD): 2
Skewness: 10.76680458
Sum: 16992
Variance: 78.43814702
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
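Common values
Value | Count | Frequency (%)
2 | 785 | 26.4%
3 | 499 | 16.8%
4 | 393 | 13.2%
5 | 237 | 8.0%
1 | 190 | 6.4%
6 | 173 | 5.8%
7 | 138 | 4.6%
8 | 98 | 3.3%
9 | 69 | 2.3%
10 | 55 | 1.9%
Other values (46) | 332 | 11.2%

Minimum 10 values
Value | Count | Frequency (%)
1 | 190 | 6.4%
2 | 785 | 26.4%
3 | 499 | 16.8%
4 | 393 | 13.2%
5 | 237 | 8.0%
6 | 173 | 5.8%
7 | 138 | 4.6%
8 | 98 | 3.3%
9 | 69 | 2.3%
10 | 55 | 1.9%

Maximum 10 values
Value | Count | Frequency (%)
206 | 1 | < 0.1%
199 | 1 | < 0.1%
124 | 1 | < 0.1%
97 | 1 | < 0.1%
91 | 2 | 0.1%
86 | 1 | < 0.1%
72 | 1 | < 0.1%
62 | 2 | 0.1%
60 | 1 | < 0.1%
57 | 1 | < 0.1%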

qtde_items
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 1664
Distinct (%): 56.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1579.345234
Minimum: 1
Maximum: 196844
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.3 KiB

Quantile statistics

Minimum: 1
5-th percentile: 101.4
Q1: 296
median: 637
Q3: 1398
95-th percentile: 4403
Maximum: 196844
Range: 196843
Interquartile range (IQR): 1102

Descriptive statistics

Standard deviation: 5699.60463
Coefficient of variation (CV): 3.608840238
Kurtosis: 518.2901316
Mean: 1579.345234
Median Absolute Deviation (MAD): 418
Skewness: 18.7632595
Sum: 4689076
Variance: 32485492.94
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
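Common values
Value | Count | Frequency (%)
310 | 11 | 0.4%
88 | 9 | 0.3%
150 | 9 | 0.3%
288 | 8 | 0.3%
84 | 8 | 0.3%
272 | 8 | 0.3%
246 | 8 | 0.3%
260 | 8 | 0.3%
114 | 7 | 0.2%
394 | 7 | 0.2%
Other values (1654) | 2886 | 97.2%

Minimum 10 values
Value | Count | Frequency (%)
1 | 1 | < 0.1%
2 | 2 | 0.1%
12 | 2 | 0.1%
16 | 1 | < 0.1%
17 | 1 | < 0.1%
18 | 1 | < 0.1%
19 | 1 | < 0.1%
20 | 1 | < 0.1%
23 | 1 | < 0.1%
25 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
196844 | 1 | < 0.1%
79963 | 1 | < 0.1%
77373 | 1 | < 0.1%
69993 | 1 | < 0.1%
64549 | 1 | < 0.1%
64124 | 1 | < 0.1%
62812 | 1 | < 0.1%
58243 | 1 | < 0.1%
57785 | 1 | < 0.1%
50255 | 1 | < 0.1%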

qtde_products
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 469
Distinct (%): 15.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 122.7234759
Minimum: 1
Maximum: 7837
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.3 KiB

Quantile statistics

Minimum: 1
5-th percentile: 9
Q1: 29
median: 67
Q3: 135
95-th percentile: 382
Maximum: 7837
Range: 7836
Interquartile range (IQR): 106

Descriptive statistics

Standard deviation: 269.8357454
Coefficient of variation (CV): 2.198729651
Kurtosis: 354.8662639
Mean: 122.7234759
Median Absolute Deviation (MAD): 44
Skewness: 15.70705355
Sum: 364366
Variance: 72811.32951
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
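Common values
Value | Count | Frequency (%)
28 | 45 | 1.5%
20 | 38 | 1.3%
35 | 35 | 1.2%
19 | 33 | 1.1%
15 | 33 | 1.1%
29 | 33 | 1.1%
11 | 32 | 1.1%
26 | 31 | 1.0%
27 | 30 | 1.0%
16 | 29 | 1.0%
Other values (459) | 2630 | 88.6%

Minimum 10 values
Value | Count | Frequency (%)
1 | 6 | 0.2%
2 | 14 | 0.5%
3 | 15 | 0.5%
4 | 17 | 0.6%
5 | 26 | 0.9%
6 | 29 | 1.0%
7 | 18 | 0.6%
8 | 19 | 0.6%
9 | 27 | 0.9%
10 | 27 | 0.9%

Maximum 10 values
Value | Count | Frequency (%)
7837 | 1 | < 0.1%
5670 | 1 | < 0.1%
5095 | 1 | < 0.1%
4577 | 1 | < 0.1%
2698 | 1 | < 0.1%
2379 | 1 | < 0.1%
2060 | 1 | < 0.1%
1818 | 1 | < 0.1%
1673 | 1 | < 0.1%
1636 | 1 | < 0.1%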

avg_ticket
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
SKEWED
UNIQUE

Distinct: 2969
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 32.99228915
Minimum: 2.150588235
Maximum: 4453.43
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.3 KiB

Quantile statistics

Minimum: 2.150588235
5-th percentile: 4.916661099
Q1: 13.11933333
median: 17.97438356
Q3: 24.97962963
95-th percentile: 89.991
Maximum: 4453.43
Range: 4451.279412
Interquartile range (IQR): 11.8602963

Descriptive statistics

Standard deviation: 119.5119039
Coefficient of variation (CV): 3.622419267
Kurtosis: 813.2414935
Mean: 32.99228915
Median Absolute Deviation (MAD): 5.981867838
Skewness: 25.1612653
Sum: 97954.1065
Variance: 14283.09517
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
17.49275862 | 1 | < 0.1%
9.418292683 | 1 | < 0.1%
28.89968794 | 1 | < 0.1%
46.07413043 | 1 | < 0.1%
25.77538462 | 1 | < 0.1%
8.745172414 | 1 | < 0.1%
18.15061538 | 1 | < 0.1%
17.94344444 | 1 | < 0.1%
15.9845 | 1 | < 0.1%
43.2192 | 1 | < 0.1%
Other values (2959) | 2959 | 99.7%

Minimum 10 values
Value | Count | Frequency (%)
2.150588235 | 1 | < 0.1%
2.4325 | 1 | < 0.1%
2.462371134 | 1 | < 0.1%
2.511241379 | 1 | < 0.1%
2.515333333 | 1 | < 0.1%
2.65 | 1 | < 0.1%
2.656931818 | 1 | < 0.1%
2.707598253 | 1 | < 0.1%
2.760621572 | 1 | < 0.1%
2.770464191 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
4453.43 | 1 | < 0.1%
3202.92 | 1 | < 0.1%
1687.2 | 1 | < 0.1%
952.9875 | 1 | < 0.1%
872.13 | 1 | < 0.1%
841.0214493 | 1 | < 0.1%
651.1683333 | 1 | < 0.1%
640 | 1 | < 0.1%
624.4 | 1 | < 0.1%
615.75 | 1 | < 0.1%

avg_recency_days
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION

Distinct: 1258
Distinct (%): 42.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 67.29203894
Minimum: 1
Maximum: 366
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.3 KiB

Quantile statistics

Minimum: 1
5-th percentile: 8
Q1: 25.92857143
median: 48.25
Q3: 85.33333333
95-th percentile: 200.6
Maximum: 366
Range: 365
Interquartile range (IQR): 59.4047619

Descriptive statistics

Standard deviation: 63.49652007
Coefficient of variation (CV): 0.9435963164
Kurtosis: 4.911076124
Mean: 67.29203894
Median Absolute Deviation (MAD): 26.25
Skewness: 2.066722809
Sum: 199790.0636
Variance: 4031.808061
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
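Common values
Value | Count | Frequency (%)
14 | 25 | 0.8%
4 | 22 | 0.7%
70 | 21 | 0.7%
7 | 20 | 0.7%
35 | 19 | 0.6%
49 | 18 | 0.6%
46 | 17 | 0.6%
11 | 17 | 0.6%
21 | 17 | 0.6%
1 | 16 | 0.5%
Other values (1248) | 2777 | 93.5%

Minimum 10 values
Value | Count | Frequency (%)
1 | 16 | 0.5%
1.5 | 1 | < 0.1%
2 | 13 | 0.4%
2.5 | 1 | < 0.1%
2.601398601 | 1 | < 0.1%
3 | 15 | 0.5%
3.321428571 | 1 | < 0.1%
3.330357143 | 1 | < 0.1%
3.5 | 2 | 0.1%
4 | 22 | 0.7%

Maximum 10 values
Value | Count | Frequency (%)
366 | 1 | < 0.1%
365 | 1 | < 0.1%
363 | 1 | < 0.1%
362 | 1 | < 0.1%
357 | 2 | 0.1%
356 | 1 | < 0.1%
355 | 2 | 0.1%
352 | 1 | < 0.1%
351 | 2 | 0.1%
350 | 3 | 0.1%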

frequency
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
SKEWED

Distinct: 1225
Distinct (%): 41.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1137995668
Minimum: 0.005449591281
Maximum: 17
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.3 KiB

Quantile statistics

Minimum: 0.005449591281
5-th percentile: 0.008894164194
Q1: 0.01633986928
median: 0.02590673575
Q3: 0.04941860465
95-th percentile: 1
Maximum: 17
Range: 16.99455041
Interquartile range (IQR): 0.03307873537

Descriptive statistics

Standard deviation: 0.4081552761
Coefficient of variation (CV): 3.586615377
Kurtosis: 989.3740169
Mean: 0.1137995668
Median Absolute Deviation (MAD): 0.01218850234
Skewness: 24.88065538
Sum: 337.8709138
Variance: 0.1665907294
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
1 | 198 | 6.7%
0.02777777778 | 17 | 0.6%
0.0625 | 17 | 0.6%
0.02380952381 | 16 | 0.5%
0.03448275862 | 15 | 0.5%
0.08333333333 | 15 | 0.5%
0.09090909091 | 15 | 0.5%
0.02941176471 | 14 | 0.5%
0.03571428571 | 13 | 0.4%
0.02564102564 | 13 | 0.4%
Other values (1215) | 2636 | 88.8%

Minimum 10 values
Value | Count | Frequency (%)
0.005449591281 | 1 | < 0.1%
0.005464480874 | 1 | < 0.1%
0.005479452055 | 1 | < 0.1%
0.005494505495 | 1 | < 0.1%
0.005586592179 | 2 | 0.1%
0.005602240896 | 1 | < 0.1%
0.005617977528 | 2 | 0.1%
0.00566572238 | 1 | < 0.1%
0.005681818182 | 2 | 0.1%
0.005698005698 | 3 | 0.1%

Maximum 10 values
Value | Count | Frequency (%)
17 | 1 | < 0.1%
3 | 1 | < 0.1%
2 | 6 | 0.2%
1.142857143 | 1 | < 0.1%
1 | 198 | 6.7%
0.75 | 1 | < 0.1%
0.6666666667 | 3 | 0.1%
0.550802139 | 1 | < 0.1%
0.5335120643 | 1 | < 0.1%
0.5 | 3 | 0.1%

qtde_returns
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
SKEWED
ZEROS

Distinct: 213
Distinct (%): 7.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 34.8773998
Minimum: 0
Maximum: 9014
Zeros: 1481
Zeros (%): 49.9%
Negative: 0
Negative (%): 0.0%
Memory size: 23.3 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 1
Q3: 9
95-th percentile: 100
Maximum: 9014
Range: 9014
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 282.8177717
Coefficient of variation (CV): 8.10891217
Kurtosis: 596.4015287
Mean: 34.8773998
Median Absolute Deviation (MAD): 1
Skewness: 21.97906809
Sum: 103551
Variance: 79985.89197
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
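Common values
Value | Count | Frequency (%)
0 | 1481 | 49.9%
1 | 164 | 5.5%
2 | 149 | 5.0%
3 | 105 | 3.5%
4 | 89 | 3.0%
6 | 78 | 2.6%
5 | 61 | 2.1%
12 | 51 | 1.7%
7 | 43 | 1.4%
8 | 43 | 1.4%
Other values (203) | 705 | 23.7%

Minimum 10 values
Value | Count | Frequency (%)
0 | 1481 | 49.9%
1 | 164 | 5.5%
2 | 149 | 5.0%
3 | 105 | 3.5%
4 | 89 | 3.0%
5 | 61 | 2.1%
6 | 78 | 2.6%
7 | 43 | 1.4%
8 | 43 | 1.4%
9 | 41 | 1.4%

Maximum 10 values
Value | Count | Frequency (%)
9014 | 1 | < 0.1%
8004 | 1 | < 0.1%
4427 | 1 | < 0.1%
3768 | 1 | < 0.1%
3332 | 1 | < 0.1%
2878 | 1 | < 0.1%
2022 | 1 | < 0.1%
2012 | 1 | < 0.1%
1776 | 1 | < 0.1%
1594 | 1 | < 0.1%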

avg_basket_size
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 1973
Distinct (%): 66.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 235.7641026
Minimum: 1
Maximum: 6009.333333
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.3 KiB

Quantile statistics

Minimum: 1
5-th percentile: 44
Q1: 103.25
median: 172
Q3: 281.3333333
95-th percentile: 598.28
Maximum: 6009.333333
Range: 6008.333333
Interquartile range (IQR): 178.0833333

Descriptive statistics

Standard deviation: 283.6790682
Coefficient of variation (CV): 1.203232659
Kurtosis: 103.1079516
Mean: 235.7641026
Median Absolute Deviation (MAD): 82.5
Skewness: 7.718835873
Sum: 699983.6206
Variance: 80473.81372
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
100 | 11 | 0.4%
114 | 10 | 0.3%
82 | 9 | 0.3%
86 | 9 | 0.3%
73 | 9 | 0.3%
136 | 8 | 0.3%
88 | 8 | 0.3%
60 | 8 | 0.3%
75 | 8 | 0.3%
163 | 7 | 0.2%
Other values (1963) | 2882 | 97.1%

Minimum 10 values
Value | Count | Frequency (%)
1 | 2 | 0.1%
2 | 1 | < 0.1%
3.333333333 | 1 | < 0.1%
5.333333333 | 1 | < 0.1%
5.666666667 | 1 | < 0.1%
6.142857143 | 1 | < 0.1%
7.5 | 1 | < 0.1%
9 | 1 | < 0.1%
9.5 | 1 | < 0.1%
11 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
6009.333333 | 1 | < 0.1%
4282 | 1 | < 0.1%
3906 | 1 | < 0.1%
3868.65 | 1 | < 0.1%
2880 | 1 | < 0.1%
2801 | 1 | < 0.1%
2733.944444 | 1 | < 0.1%
2518.769231 | 1 | < 0.1%
2160.333333 | 1 | < 0.1%
2082.225806 | 1 | < 0.1%

avg_unique_basket_size
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION

Distinct: 910
Distinct (%): 30.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 17.49000174
Minimum: 0.2
Maximum: 259
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.3 KiB

Quantile statistics

Minimum: 0.2
5-th percentile: 2
Q1: 7.666666667
median: 13.6
Q3: 22
95-th percentile: 46
Maximum: 259
Range: 258.8
Interquartile range (IQR): 14.33333333

Descriptive statistics

Standard deviation: 15.45948697
Coefficient of variation (CV): 0.8839042559
Kurtosis: 29.31413557
Mean: 17.49000174
Median Absolute Deviation (MAD): 6.6
Skewness: 3.435085166
Sum: 51927.81516
Variance: 238.9957374
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
13 | 43 | 1.4%
9 | 42 | 1.4%
16 | 41 | 1.4%
8 | 39 | 1.3%
17 | 37 | 1.2%
14 | 37 | 1.2%
11 | 36 | 1.2%
7 | 36 | 1.2%
15 | 34 | 1.1%
5 | 34 | 1.1%
Other values (900) | 2590 | 87.2%

Minimum 10 values
Value | Count | Frequency (%)
0.2 | 1 | < 0.1%
0.25 | 3 | 0.1%
0.3333333333 | 6 | 0.2%
0.4 | 1 | < 0.1%
0.4090909091 | 1 | < 0.1%
0.5 | 12 | 0.4%
0.5454545455 | 1 | < 0.1%
0.5555555556 | 1 | < 0.1%
0.5714285714 | 1 | < 0.1%
0.6176470588 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
259 | 1 | < 0.1%
177 | 1 | < 0.1%
148 | 1 | < 0.1%
127 | 1 | < 0.1%
105 | 1 | < 0.1%
104 | 1 | < 0.1%
101 | 1 | < 0.1%
98 | 1 | < 0.1%
95.5 | 1 | < 0.1%
94.33333333 | 1 | < 0.1%

Interactions

[Figures: pairwise interaction plots between variables (Matplotlib v3.4.2 SVG images, not reproduced in this text)]
2021-06-19T15:39:04.545983image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-06-19T15:39:04.641183image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-06-19T15:39:04.732917image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-06-19T15:39:04.819732image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-06-19T15:39:04.909450image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-06-19T15:39:04.999033image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-06-19T15:39:05.087459image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-06-19T15:39:05.174185image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-06-19T15:39:05.260985image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-06-19T15:39:05.354722image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-06-19T15:39:05.438575image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-06-19T15:39:05.531559image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-06-19T15:39:05.623996image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-06-19T15:39:05.707923image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-06-19T15:39:05.799844image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-06-19T15:39:05.891389image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-06-19T15:39:06.244419image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-06-19T15:39:06.333257image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/

Correlations


Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for an exactly linear relationship the steepness of the line does not affect r; only the sign of its slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
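
The calculation described above can be sketched in plain Python. This is an illustrative sketch only, not the code that produced this report; the function name `pearson_r` is an assumption introduced here, and the example values are made up.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r: covariance of X and Y divided by the product of
    their standard deviations (the 1/n factors cancel out)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Unnormalised covariance and unnormalised standard deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("r is undefined for a constant variable")
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    xs = [1.0, 2.0, 4.0, 7.0]
    ys = [3.0, 1.0, 5.0, 4.0]
    # Shifting and rescaling either variable leaves r unchanged.
    print(pearson_r(xs, ys))
    print(pearson_r([10 * x + 3 for x in xs], ys))
```

The two printed values should agree up to floating-point rounding, which illustrates the invariance to location and scale mentioned above.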

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
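
A minimal sketch of that definition follows: rank each variable, then apply the Pearson formula to the ranks. The helper names (`average_ranks`, `pearson_r`, `spearman_rho`) are assumptions made for illustration, and giving tied values the average of their ranks is one common convention, assumed here rather than taken from this report.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson_r(average_ranks(xs), average_ranks(ys))


if __name__ == "__main__":
    xs = [1, 2, 3, 4]
    ys = [1, 8, 27, 64]  # y = x**3: monotonic but not linear
    print(spearman_rho(xs, ys), pearson_r(xs, ys))
```

For the cubic example, ρ reaches its maximum because the ordering is preserved exactly, while r stays below 1 because the relationship is not linear.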

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
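
The pair-counting definition above corresponds to the variant usually called tau-a, and can be sketched as follows. The function name `kendall_tau_a` is an assumption for illustration; tied pairs are counted as neither concordant nor discordant here, and library implementations may use a tie-corrected variant instead, so results can differ when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs.

    A pair of observations is concordant when both variables order the
    two observations the same way, and discordant when they disagree.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0 means a tie in X or Y: counted as neither.
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


if __name__ == "__main__":
    # Six pairs in total; only the pair (2, 3) vs (3, 2) is discordant.
    print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

This direct version compares every pair and so takes quadratic time, which is fine for illustration but slow for large datasets.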

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available in the Phik project documentation.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.

Sample

First rows

      df_index  customer_id  gross_revenue  recency_days  qtde_invoices  qtde_items  qtde_products  avg_ticket  avg_recency_days  frequency  qtde_returns  avg_basket_size  avg_unique_basket_size
   0         0        17850        5391.21         372.0           34.0      1733.0          297.0   18.152222         35.500000  17.000000          40.0        50.970588                0.617647
   1         1        13047        3232.59          56.0            9.0      1390.0          171.0   18.904035         27.250000   0.028302          35.0       154.444444               11.666667
   2         2        12583        6705.38           2.0           15.0      5028.0          232.0   28.902500         23.187500   0.040323          50.0       335.200000                7.600000
   3         3        13748         948.25          95.0            5.0       439.0           28.0   33.866071         92.666667   0.017921           0.0        87.800000                4.800000
   4         4        15100         876.00         333.0            3.0        80.0            3.0  292.000000          8.600000   0.073171          22.0        26.666667                0.333333
   5         5        15291        4623.30          25.0           14.0      2102.0          102.0   45.326471         23.200000   0.040115          29.0       150.142857                4.357143
   6         6        14688        5630.87           7.0           21.0      3621.0          327.0   17.219786         18.300000   0.057221         399.0       172.428571                7.047619
   7         7        17809        5411.91          16.0           12.0      2057.0           61.0   88.719836         35.700000   0.033520          41.0       171.416667                3.833333
   8         8        15311       60767.90           0.0           91.0     38194.0         2379.0   25.543464          4.144444   0.243316         474.0       419.714286                6.230769
   9         9        16098        2005.63          87.0            7.0       613.0           67.0   29.934776         47.666667   0.024390           0.0        87.571429                4.857143

Last rows

      df_index  customer_id  gross_revenue  recency_days  qtde_invoices  qtde_items  qtde_products  avg_ticket  avg_recency_days  frequency  qtde_returns  avg_basket_size  avg_unique_basket_size
2959      5627        17727        1060.25          15.0            1.0       645.0           66.0   16.064394               6.0   1.000000           6.0       645.000000               66.000000
2960      5637        17232         421.52           2.0            2.0       203.0           36.0   11.708889              12.0   0.153846           0.0       101.500000               15.000000
2961      5638        17468         137.00          10.0            2.0       116.0            5.0   27.400000               4.0   0.400000           0.0        58.000000                2.500000
2962      5649        13596         697.04           5.0            2.0       406.0          166.0    4.199036               7.0   0.250000           0.0       203.000000               66.500000
2963      5655        14893        1237.85           9.0            2.0       799.0           73.0   16.956849               2.0   0.666667           0.0       399.500000               36.000000
2964      5659        12479         473.20          11.0            1.0       382.0           30.0   15.773333               4.0   1.000000          34.0       382.000000               30.000000
2965      5680        14126         706.13           7.0            3.0       508.0           15.0   47.075333               3.0   0.750000          50.0       169.333333                4.666667
2966      5686        13521        1092.39           1.0            3.0       733.0          435.0    2.511241               4.5   0.300000           0.0       244.333333              104.000000
2967      5696        15060         301.84           8.0            4.0       262.0          120.0    2.515333               1.0   2.000000           0.0        65.500000               20.000000
2968      5715        12558         269.96           7.0            1.0       196.0           11.0   24.541818               6.0   1.000000         196.0       196.000000               11.000000